-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve logs when there is a timeout error #1946
Conversation
@hty690, your company's legal contact has approved your signed contributor license agreement. It will also be reviewed by VMware, but the merge can proceed. |
@@ -576,6 +576,11 @@ func (w *watcher) watch() { | |||
klog.Warningf("Failed to start watch for %s: %v", w.objectType, err) | |||
return | |||
} | |||
_, ok := watcher.(*watch.StreamWatcher) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add comment for this handling for others to understand.
The unit test failure failed because of this. It should be because the fake client used in test is not a StreamWatcher. Maybe your original approach that compares the type of watch with watch.NewEmptyWatch()
can resolve this but it looks a little implementation intrusive (maybe current approach is same).
@antoninbas what's your opinion here? The context is, the Watch method does not return error if it's a timeout error or "ProbableEOF" and return watch.NewEmptyWatch()
instead, that's why you saw no log about connection problem in the issue. See https://github.com/kubernetes/kubernetes/blob/823fa75643bdc352a51cbbc83b55420734832eda/staging/src/k8s.io/client-go/rest/request.go#L702-L709
cmd/antrea-agent/agent.go
Outdated
} | ||
return fmt.Errorf("Some watchers may not be connected") | ||
}) | ||
apiServer.GenericAPIServer.AddHealthChecks(check) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HealthCheck will be added to livezChecks too, what the issue asked was watcher's connection state should be reflected in readiness probe.
@hty690 please add description to the commit message and PR description about what else the PR does, e.g. improving logs when there is a timeout error, and don't add trailing period to the title of commit message and PR. See https://github.com/golang/go/wiki/CommitMessage. |
Codecov Report
@@ Coverage Diff @@
## main #1946 +/- ##
==========================================
+ Coverage 64.35% 67.48% +3.13%
==========================================
Files 193 195 +2
Lines 16967 16882 -85
==========================================
+ Hits 10919 11393 +474
+ Misses 4899 4305 -594
- Partials 1149 1184 +35
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@hty690, VMware has approved your signed contributor license agreement. |
@@ -576,7 +576,12 @@ func (w *watcher) watch() { | |||
klog.Warningf("Failed to start watch for %s: %v", w.objectType, err) | |||
return | |||
} | |||
|
|||
//Make sure that watcher is not the type of watch.NewEmptyWatch() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
//Make sure -> // Makesure
4f74539
to
e8b73e2
Compare
@@ -584,6 +585,11 @@ func (w *watcher) watch() { | |||
klog.Warningf("Failed to start watch for %s: %v", w.objectType, err) | |||
return | |||
} | |||
// Makesure that watcher is not the type of watch.NewEmptyWatch() | |||
if reflect.TypeOf(watcher) == reflect.TypeOf(watch.NewEmptyWatch()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declare the latter as a variable to avoid unnecessary construction of emptyWatch everytime we need to compare.
pkg/agent/apiserver/apiserver.go
Outdated
if npq.GetControllerConnectionStatus() { | ||
return nil | ||
} | ||
return fmt.Errorf("Some watchers may not be connected") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returned error shouldn't be capitalized
@@ -584,6 +585,11 @@ func (w *watcher) watch() { | |||
klog.Warningf("Failed to start watch for %s: %v", w.objectType, err) | |||
return | |||
} | |||
// Makesure that watcher is not the type of watch.NewEmptyWatch() | |||
if reflect.TypeOf(watcher) == reflect.TypeOf(watch.NewEmptyWatch()) { | |||
klog.Warningf("Failed to start watch for %s. Something wrong with the connection?", w.objectType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.Warningf("Failed to start watch for %s. Something wrong with the connection?", w.objectType) | |
klog.Warningf("Failed to start watch for %s. Please ensure antrea service is reachable for the agent", w.objectType) |
pkg/agent/apiserver/apiserver.go
Outdated
@@ -97,6 +99,14 @@ func New(aq agentquerier.AgentQuerier, npq querier.AgentNetworkPolicyInfoQuerier | |||
if err != nil { | |||
return nil, err | |||
} | |||
// Add readiness probe to check the status of watchers. | |||
check := healthz.NamedCheck("watcher", func(_ *http.Request) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you move this to newConfig
given that function initializes the whole config?
The agent logs will print the failure message when the connection to the controller is timeout. Also adding readiness probe to remind the users of the connectivity issue. Fixes antrea-io#822
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
@antoninbas could you check if this PR addesses the original issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to test this and it meets expectations, except for the readiness probe failing with reason withheld
which I am unsure about.
// Add readiness probe to check the status of watchers. | ||
check := healthz.NamedCheck("watcher", func(_ *http.Request) error { | ||
if npq.GetControllerConnectionStatus() { | ||
return nil | ||
} | ||
return fmt.Errorf("some watchers may not be connected") | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when I test the probe, I get [-]watcher failed: reason withheld
reason withheld
is not what I'd expect here, shouldn't it say some watchers may not be connected
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Antonin Bas <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
The agent logs will print the failure message when the connection to
the controller is timeout. Also adding HealthCheck to remind
the users of the connectivity issue.
Fixes #822